<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>20. Structured Markup Processing Tools &mdash; Python v2.6.2 documentation</title>
    <link rel="stylesheet" href="../_static/default.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../',
        VERSION:     '2.6.2',
        COLLAPSE_MODINDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <link rel="search" type="application/opensearchdescription+xml"
          title="Search within Python v2.6.2 documentation"
          href="../_static/opensearch.xml"/>
    <link rel="author" title="About these documents" href="../about.html" />
    <link rel="copyright" title="Copyright" href="../copyright.html" />
    <link rel="top" title="Python v2.6.2 documentation" href="../index.html" />
    <link rel="up" title="The Python Standard Library" href="index.html" />
    <link rel="next" title="20.1. HTMLParser — Simple HTML and XHTML parser" href="htmlparser.html" />
    <link rel="prev" title="19.16. uu — Encode and decode uuencode files" href="uu.html" />
    <link rel="shortcut icon" type="image/png" href="../_static/py.png" />
 

  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="../modindex.html" title="Global Module Index"
             accesskey="M">modules</a> |</li>
        <li class="right" >
          <a href="htmlparser.html" title="20.1. HTMLParser — Simple HTML and XHTML parser"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="uu.html" title="19.16. uu — Encode and decode uuencode files"
             accesskey="P">previous</a> |</li>
        <li><img src="../_static/py.png" alt=""
                 style="vertical-align: middle; margin-top: -1px"/></li>
        <li><a href="../index.html">Python v2.6.2 documentation</a> &raquo;</li>

          <li><a href="index.html" accesskey="U">The Python Standard Library</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="structured-markup-processing-tools">
<span id="markup"></span><h1>20. Structured Markup Processing Tools<a class="headerlink" href="#structured-markup-processing-tools" title="Permalink to this headline">¶</a></h1>
<p>Python provides a variety of modules for working with various forms of structured
data markup.  These include modules for the Standard Generalized Markup
Language (SGML) and the Hypertext Markup Language (HTML), and several interfaces
for working with the Extensible Markup Language (XML).</p>
<p>Modules in the <tt class="xref docutils literal"><span class="pre">xml</span></tt> package require that
at least one SAX-compliant XML parser be available. Starting with Python
2.3, the Expat parser is included with Python, so the <a title="An interface to the Expat non-validating XML parser." class="reference external" href="pyexpat.html#module-xml.parsers.expat"><tt class="xref docutils literal"><span class="pre">xml.parsers.expat</span></tt></a>
module is always available. The <a class="reference external" href="http://pyxml.sourceforge.net/">PyXML
add-on package</a> provides an
extended set of XML libraries for Python.</p>
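<p>As a minimal sketch of what the bundled Expat parser offers, the example below
builds a tiny document with <tt class="docutils literal"><span class="pre">xml.etree.ElementTree</span></tt> and feeds it to
<tt class="docutils literal"><span class="pre">xml.parsers.expat</span></tt>. The element names (<tt class="docutils literal"><span class="pre">library</span></tt>, <tt class="docutils literal"><span class="pre">book</span></tt>) and the helper
<tt class="docutils literal"><span class="pre">collect_start_tags</span></tt> are assumptions made for illustration; they are not part of
any module.</p>
<div class="highlight-python"><div class="highlight"><pre>
from xml.etree import ElementTree
from xml.parsers import expat


def collect_start_tags(data):
    """Return the element names found in data, in document order."""
    names = []

    def start_element(name, attrs):
        # Expat calls this handler once for every opening tag.
        names.append(name)

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    # The second argument marks this as the final chunk of input.
    parser.Parse(data, True)
    return names


# Build a small sample document (hypothetical element names).
root = ElementTree.Element("library")
ElementTree.SubElement(root, "book", title="Example")
sample = ElementTree.tostring(root)

tags = collect_start_tags(sample)
</pre></div></div>
<p>After this runs, <tt class="docutils literal"><span class="pre">tags</span></tt> should hold the names <tt class="docutils literal"><span class="pre">library</span></tt> and <tt class="docutils literal"><span class="pre">book</span></tt>, in
document order.</p>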
<p>The documentation for the <a title="Document Object Model API for Python." class="reference external" href="xml.dom.html#module-xml.dom"><tt class="xref docutils literal"><span class="pre">xml.dom</span></tt></a> and <a title="Package containing SAX2 base classes and convenience functions." class="reference external" href="xml.sax.html#module-xml.sax"><tt class="xref docutils literal"><span class="pre">xml.sax</span></tt></a> packages is the
definition of the Python bindings for the DOM and SAX interfaces.</p>
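<p>The difference between the two interfaces can be sketched briefly: SAX delivers
events to a handler while the document is read, whereas DOM loads the whole
document into a tree that is queried afterwards. The sample document, its element
names and the <tt class="docutils literal"><span class="pre">TitleCollector</span></tt> class below are assumptions made for illustration.</p>
<div class="highlight-python"><div class="highlight"><pre>
import xml.sax
from xml.dom import minidom
from xml.etree import ElementTree


class TitleCollector(xml.sax.ContentHandler):
    """SAX handler that records the title attribute of each book element."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.titles = []

    def startElement(self, name, attrs):
        # SAX pushes events to the handler while the document is read.
        if name == "book":
            self.titles.append(attrs.getValue("title"))


# Build a small sample document (hypothetical element names).
root = ElementTree.Element("library")
ElementTree.SubElement(root, "book", title="First")
ElementTree.SubElement(root, "book", title="Second")
sample = ElementTree.tostring(root)

# SAX: stream through the document, collecting titles as events arrive.
handler = TitleCollector()
xml.sax.parseString(sample, handler)
sax_titles = handler.titles

# DOM: load the whole document into a tree, then query it.
document = minidom.parseString(sample)
dom_titles = [node.getAttribute("title")
              for node in document.getElementsByTagName("book")]
</pre></div></div>
<p>Both approaches should yield the same list of titles here. SAX tends to suit
large inputs that are processed once, while DOM is convenient when the tree must be
navigated or modified.</p>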
<ul>
<li class="toctree-l1"><a class="reference external" href="htmlparser.html">20.1. <tt class="docutils literal"><span class="pre">HTMLParser</span></tt> &#8212; Simple HTML and XHTML parser</a><ul>
<li class="toctree-l2"><a class="reference external" href="htmlparser.html#example-html-parser-application">20.1.1. Example HTML Parser Application</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="sgmllib.html">20.2. <tt class="docutils literal"><span class="pre">sgmllib</span></tt> &#8212; Simple SGML parser</a></li>
<li class="toctree-l1"><a class="reference external" href="htmllib.html">20.3. <tt class="docutils literal"><span class="pre">htmllib</span></tt> &#8212; A parser for HTML documents</a><ul>
<li class="toctree-l2"><a class="reference external" href="htmllib.html#htmlparser-objects">20.3.1. HTMLParser Objects</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="htmllib.html#module-htmlentitydefs">20.4. <tt class="docutils literal"><span class="pre">htmlentitydefs</span></tt> &#8212; Definitions of HTML general entities</a></li>
<li class="toctree-l1"><a class="reference external" href="pyexpat.html">20.5. <tt class="docutils literal"><span class="pre">xml.parsers.expat</span></tt> &#8212; Fast XML parsing using Expat</a><ul>
<li class="toctree-l2"><a class="reference external" href="pyexpat.html#xmlparser-objects">20.5.1. XMLParser Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="pyexpat.html#expaterror-exceptions">20.5.2. ExpatError Exceptions</a></li>
<li class="toctree-l2"><a class="reference external" href="pyexpat.html#example">20.5.3. Example</a></li>
<li class="toctree-l2"><a class="reference external" href="pyexpat.html#content-model-descriptions">20.5.4. Content Model Descriptions</a></li>
<li class="toctree-l2"><a class="reference external" href="pyexpat.html#expat-error-constants">20.5.5. Expat error constants</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="xml.dom.html">20.6. <tt class="docutils literal"><span class="pre">xml.dom</span></tt> &#8212; The Document Object Model API</a><ul>
<li class="toctree-l2"><a class="reference external" href="xml.dom.html#module-contents">20.6.1. Module Contents</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.dom.html#objects-in-the-dom">20.6.2. Objects in the DOM</a><ul>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#domimplementation-objects">20.6.2.1. DOMImplementation Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#node-objects">20.6.2.2. Node Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#nodelist-objects">20.6.2.3. NodeList Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#documenttype-objects">20.6.2.4. DocumentType Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#document-objects">20.6.2.5. Document Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#element-objects">20.6.2.6. Element Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#attr-objects">20.6.2.7. Attr Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#namednodemap-objects">20.6.2.8. NamedNodeMap Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#comment-objects">20.6.2.9. Comment Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#text-and-cdatasection-objects">20.6.2.10. Text and CDATASection Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#processinginstruction-objects">20.6.2.11. ProcessingInstruction Objects</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#exceptions">20.6.2.12. Exceptions</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference external" href="xml.dom.html#conformance">20.6.3. Conformance</a><ul>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#type-mapping">20.6.3.1. Type Mapping</a></li>
<li class="toctree-l3"><a class="reference external" href="xml.dom.html#accessor-methods">20.6.3.2. Accessor Methods</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="xml.dom.minidom.html">20.7. <tt class="docutils literal"><span class="pre">xml.dom.minidom</span></tt> &#8212; Lightweight DOM implementation</a><ul>
<li class="toctree-l2"><a class="reference external" href="xml.dom.minidom.html#dom-objects">20.7.1. DOM Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.dom.minidom.html#dom-example">20.7.2. DOM Example</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.dom.minidom.html#minidom-and-the-dom-standard">20.7.3. minidom and the DOM standard</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="xml.dom.pulldom.html">20.8. <tt class="docutils literal"><span class="pre">xml.dom.pulldom</span></tt> &#8212; Support for building partial DOM trees</a><ul>
<li class="toctree-l2"><a class="reference external" href="xml.dom.pulldom.html#domeventstream-objects">20.8.1. DOMEventStream Objects</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="xml.sax.html">20.9. <tt class="docutils literal"><span class="pre">xml.sax</span></tt> &#8212; Support for SAX2 parsers</a><ul>
<li class="toctree-l2"><a class="reference external" href="xml.sax.html#saxexception-objects">20.9.1. SAXException Objects</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="xml.sax.handler.html">20.10. <tt class="docutils literal"><span class="pre">xml.sax.handler</span></tt> &#8212; Base classes for SAX handlers</a><ul>
<li class="toctree-l2"><a class="reference external" href="xml.sax.handler.html#contenthandler-objects">20.10.1. ContentHandler Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.sax.handler.html#dtdhandler-objects">20.10.2. DTDHandler Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.sax.handler.html#entityresolver-objects">20.10.3. EntityResolver Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.sax.handler.html#errorhandler-objects">20.10.4. ErrorHandler Objects</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="xml.sax.utils.html">20.11. <tt class="docutils literal"><span class="pre">xml.sax.saxutils</span></tt> &#8212; SAX Utilities</a></li>
<li class="toctree-l1"><a class="reference external" href="xml.sax.reader.html">20.12. <tt class="docutils literal"><span class="pre">xml.sax.xmlreader</span></tt> &#8212; Interface for XML parsers</a><ul>
<li class="toctree-l2"><a class="reference external" href="xml.sax.reader.html#xmlreader-objects">20.12.1. XMLReader Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.sax.reader.html#incrementalparser-objects">20.12.2. IncrementalParser Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.sax.reader.html#locator-objects">20.12.3. Locator Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.sax.reader.html#inputsource-objects">20.12.4. InputSource Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.sax.reader.html#the-attributes-interface">20.12.5. The <tt class="docutils literal"><span class="pre">Attributes</span></tt> Interface</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.sax.reader.html#the-attributesns-interface">20.12.6. The <tt class="docutils literal"><span class="pre">AttributesNS</span></tt> Interface</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference external" href="xml.etree.elementtree.html">20.13. <tt class="docutils literal"><span class="pre">xml.etree.ElementTree</span></tt> &#8212; The ElementTree XML API</a><ul>
<li class="toctree-l2"><a class="reference external" href="xml.etree.elementtree.html#functions">20.13.1. Functions</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.etree.elementtree.html#the-element-interface">20.13.2. The Element Interface</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.etree.elementtree.html#elementtree-objects">20.13.3. ElementTree Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.etree.elementtree.html#qname-objects">20.13.4. QName Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.etree.elementtree.html#treebuilder-objects">20.13.5. TreeBuilder Objects</a></li>
<li class="toctree-l2"><a class="reference external" href="xml.etree.elementtree.html#xmltreebuilder-objects">20.13.6. XMLTreeBuilder Objects</a></li>
</ul>
</li>
</ul>
<div class="admonition-see-also admonition seealso">
<p class="first admonition-title">See also</p>
<dl class="last docutils">
<dt><a class="reference external" href="http://pyxml.sourceforge.net/">Python/XML Libraries</a></dt>
<dd>Home page for the PyXML package, containing an extension of the <tt class="xref docutils literal"><span class="pre">xml</span></tt> package
bundled with Python.</dd>
</dl>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
            <h4>Previous topic</h4>
            <p class="topless"><a href="uu.html"
                                  title="previous chapter">19.16. <tt class="docutils literal"><span class="pre">uu</span></tt> &#8212; Encode and decode uuencode files</a></p>
            <h4>Next topic</h4>
            <p class="topless"><a href="htmlparser.html"
                                  title="next chapter">20.1. <tt class="docutils literal"><span class="pre">HTMLParser</span></tt> &#8212; Simple HTML and XHTML parser</a></p>
            <h3>This Page</h3>
            <ul class="this-page-menu">
              <li><a href="../_sources/library/markup.txt"
                     rel="nofollow">Show Source</a></li>
            </ul>
          <div id="searchbox" style="display: none">
            <h3>Quick search</h3>
              <form class="search" action="../search.html" method="get">
                <input type="text" name="q" size="18" />
                <input type="submit" value="Go" />
                <input type="hidden" name="check_keywords" value="yes" />
                <input type="hidden" name="area" value="default" />
              </form>
              <p class="searchtip" style="font-size: 90%">
              Enter search terms or a module, class or function name.
              </p>
          </div>
          <script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
      &copy; <a href="../copyright.html">Copyright</a> 1990-2009, Python Software Foundation.
      Last updated on Apr 15, 2009.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 0.6.1.
    </div>
  </body>
</html>